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Computing the Internet Checksum 


Status of This Memo 


This memo summarizes techniques and algorithms for efficiently 
computing the Internet checksum. It is not a standard, but a set of 
useful implementation techniques. Distribution of this memo is 
unlimited. 


1. Introduction 


This memo discusses methods for efficiently computing the Internet 
checksum that is used by the standard Internet protocols IP, UDP, and 
TCP. 


An efficient checksum implementation is critical to good performance. 
As advances in implementation techniques streamline the rest of the 
protocol processing, the checksum computation becomes one of the 
limiting factors on TCP performance, for example. It is usually 
appropriate to carefully hand-craft the checksum routine, exploiting 
every machine-dependent trick possible; a fraction of a microsecond 
per TCP data byte can add up to a significant CPU time savings 
overall. 


In outline, the Internet checksum algorithm is very simple: 


(1) Adjacent octets to be checksummed are paired to form 16-bit 
integers, and the 1’s complement sum of these 16-bit integers is 
formed. 


(2) To generate a checksum, the checksum field itself is cleared, 
the 16-bit 1’s complement sum is computed over the octets 
concerned, and the 1’s complement of this sum is placed in the 
checksum field. 


(3) To check a checksum, the 1’s complement sum is computed over the 
same set of octets, including the checksum field. If the result 
is all 1 bits (-0O in 1’s complement arithmetic), the check 
succeeds. 


Suppose a checksum is to be computed over the sequence of octets 
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A, B, C, D, ... , Y, Z. Using the notation [a,b] for the 16-bit 
integer a*256+b, where a and b are bytes, then the 16-bit 1’s 
complement sum of these bytes is given by one of the following: 


(A,B) +" JEGpDY AY ego Re Eya] [1] 
[AFB] +f ECD F” sere bo [27-04 [2] 


where +’ indicates 1’s complement addition. These cases 
correspond to an even or odd count of bytes, respectively. 


On a 2’s complement machine, the 1’s complement sum must be 
computed by means of an "end around carry", i.e., any overflows 
from the most significant bits are added into the least 
significant bits. See the examples below. 


Section 2 explores the properties of this checksum that may be 
exploited to speed its calculation. Section 3 contains some 
numerical examples of the most important implementation 
techniques. Finally, Section 4 includes examples of specific 
algorithms for a variety of common CPU types. We are grateful 
to Van Jacobson and Charley Kline for their contribution of 
algorithms to this section. 


The properties of the Internet checksum were originally 
discussed by Bill Plummer in IEN-45, entitled "Checksum Function 
Design". Since IEN-45 has not been widely available, we include 
it as an extended appendix to this RFC. 
2. Calculating the Checksum 

This simple checksum has a number of wonderful mathematical 
properties that may be exploited to speed its calculation, as we 
will now discuss. 

(A) Commutative and Associative 
As long as the even/odd assignment of bytes is respected, the 
sum can be done in any order, and it can be arbitrarily split 
into groups. 
For example, the sum [1] could be split into: 


CABI obo LCDI RO a4 FE ETAT 4) 


Bey LOG Rae Baa A Zh) [3] 
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(B) Byte Order Independence 


The sum of 16-bit integers can be computed in either byte order. 
Thus, if we calculate the swapped sum: 


[B,A] +” [D,C] +’ ... +" [24,¥] [4] 
the result is the same as [1], except the bytes are swapped in 


the sum! To see why this is so, observe that in both orders the 
carries are the same: from bit 15 to bit 0 and from bit 7 to bit 


8. In other words, consistently swapping bytes simply rotates 
the bits within the sum, but does not affect their internal 
ordering. 


Therefore, the sum may be calculated in exactly the same way 
regardless of the byte order ("big-endian" or "little-endian") 
of the underlaying hardware. For example, assume a "little- 
endian" machine summing data that is stored in memory in network 
("big-endian") order. Fetching each 16-bit word will swap 
bytes, resulting in the sum [4]; however, storing the result 
back into memory will swap the sum back into network byte order. 


Byte swapping may also be used explicitly to handle boundary 
alignment problems. For example, the second group in [3] can be 
calculated without concern to its odd/even origin, as: 


[K,L] +’ ... +’ [Z,0] 


if this sum is byte-swapped before it is added to the first 
group. See the example below. 


(C) Parallel Summation 


On machines that have word-sizes that are multiples of 16 bits, 
it is possible to develop even more efficient implementations. 
Because addition is associative, we do not have to sum the 
integers in the order they appear in the message. Instead we 
can add them in "parallel" by exploiting the larger word size. 


To compute the checksum in parallel, simply do a 1’s complement 
addition of the message using the native word size of the 
machine. For example, on a 32-bit machine we can add 4 bytes at 
a time: [A,B,C,D]+’... When the sum has been computed, we "fold" 
the long sum into 16 bits by adding the 16-bit segments. Each 
16-bit addition may produce new end-around carries that must be 
added. 


Furthermore, again the byte order does not matter; we could 
instead sum 32-bit words: [D,C,B,A]+’... or [B,A,D,C]+’... and 
then swap the bytes of the final 16-bit sum as necessary. See 
the examples below. Any permutation is allowed that collects 
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all the even-numbered data bytes into one sum byte and the odd- 
numbered data bytes into the other sum byte. 


There are further coding techniques that can be exploited to speed up 
the checksum calculation. 


(1) Deferred Carries 


Depending upon the machine, it may be more efficient to defer 
adding end-around carries until the main summation loop is 
finished. 


One approach is to sum 16-bit words in a 32-bit accumulator, so 
the overflows build up in the high-order 16 bits. This approach 
typically avoids a carry-sensing instruction but requires twice 
as many additions as would adding 32-bit segments; which is 
faster depends upon the detailed hardware architecture. 


(2) Unwinding Loops 


To reduce the loop overhead, it is often useful to "unwind" the 
inner sum loop, replicating a series of addition commands within 


one loop traversal. This technique often provides significant 
savings, although it may complicate the logic of the program 
considerably. 


(3) Combine with Data Copying 


Like checksumming, copying data from one memory location to 
another involves per-byte overhead. In both cases, the 
bottleneck is essentially the memory bus, i.e., how fast the 
data can be fetched. On some machines (especially relatively 
slow and simple micro-computers), overhead can be significantly 
reduced by combining memory-to-memory copy and the checksumming, 
fetching the data only once for both. 


(4) Incremental Update 


Finally, one can sometimes avoid recomputing the entire checksum 
when one header field is updated. The best-known example is a 
gateway changing the TTL field in the IP header, but there are 
other examples (for example, when updating a source route). In 
these cases it is possible to update the checksum without 
scanning the message or datagram. 


To update the checksum, simply add the differences of the 
sixteen bit integers that have been changed. To see why this 
works, observe that every 16-bit integer has an additive inverse 
and that addition is associative. From this it follows that 
given the original value m, the new value m’, and the old 
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the new checksum C’ is: 
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We now present explicit examples of calculating a simple 1’s 


complement sum on a 2’s complement machine. 
same sum calculated byte by bye, 


swapped order, 
numbers are in hex. 


Byte-by-byte 


Byte 0/1: 00 01 
Byte 2/3: f2 03 
Byte 4/5: f4 £5 
Byte 6/7: £6 £7 
Sum1: 2dc 1£f0 
dc fO 
Carrys: t 2 
Sum2: dd f2 
Final Swap: dd f2 
Byte 0/1/2/3: 0001£203 
Byte 4/5/6/7: f4f5f6f7 
Sum1: Of4f7e8fa 
Carries: 0 
Top half: f4f7 
Bottom half: e8fa 
Sum2: 1ddf1 
ddf1 
Carrys: 1 
Sum3: ddf2 
Final Swap: ddf2 
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"Normal" 
Order 


010003f2 
f5f4f7f6 


The examples show the 


by 16-bits words in normal and 
and 32 bits at a time in 3 different orders. 


All 


Swapped 
Order 


03£20100 
f7f6f5f4 
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here an example of breaking the sum into two groups, 
the second group starting on a odd boundary: 


Byte-by-byte 


01 
(00) 


01 


f4 
f6 
(00) 


Byte 0/1: 00 
Byte 2/ : f2 
Sum1: f2 
Byte 4/5: 03 
Byte 6/7: £5 
Byte 8/: £7 
Sum2: 
Sum2: 
Carry: 
Sum3: 
Suml: 


Sum3 byte swapped: 


Sum4: 


Sum4: 
Carry: 


Sum5: 
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Normal 
Order 
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4. Implementation Examples 


In this section we show examples of Internet checksum implementation 
algorithms that have been found to be efficient on a variety of 
CPU’s. In each case, we show the core of the algorithm, without 
including environmental code (e.g., subroutine linkages) or special- 
case code. 


Ae 7 nton 


The following "C" code algorithm computes the checksum with an inner 
loop that sums 16-bits at a time in a 32-bit accumulator. 


in 6 
{ 
/* Compute Internet Checksum for "count" bytes 
* beginning at location "addr". 
*/ 


register long sum = 0; 


while( count > 1 ) { 
/* This is the inner loop */ 
sum += * (unsigned short) addr++; 
count -= 2; 


/* Add left-over byte, if any */ 
if( count > 0 ) 
sum += * (unsigned char *) addr; 


/* Fold 32-bit sum to 16 bits */ 
while (sum>>16) 


sum = (sum & Oxffff) + (sum >> 16); 


checksum = “sum; 
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4.2 Motorola 68020 


The following algorithm is given in assembler language for a Motorola 
68020 chip. This algorithm performs the sum 32 bits at a time, and 
unrolls the loop with 16 replications. For clarity, we have omitted 
the logic to add the last fullword when the length is not a multiple 
of 4. The result is left in register d0. 


With a 20MHz clock, this routine was measured at 134 usec/kB summing 
random data. This algorithm was developed by Van Jacobson. 


movl d1,d2 
lsrl #6,d1 | count/64 = # loop traversals 
andl #0x3c, d2 | Then find fractions of a chunk 
negl d2 
andb #0xf, cc | Clear X (extended carry flag) 
jmp pcl (2$-.-2:b,d2) | Jump into loop 

1$: | Begin inner loop... 
movl a0@+, d2 Fetch 32-bit word 


| Add word + previous carry 
movl a0@+, d2 | Fetch 32-bit word 
| Add word + previous carry 


| ... 14 more replications 


2$: 
dbra d1, 1$ | (NB- dbra doesn’t affect X) 
movl do,dl | Fold 32 bit sum to 16 bits 
swap dl | (NB- swap doesn’t affect X) 
addxw d1,qdo 
jec 3$ 
addw #1,d0 

3S: 
andl #O0xffff, dod 
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4.3 Cray 


The following example, 
contributed by Charley Kline. 
as a vector operation, 
summation unit of 32 bits. 
do with short blocks, 
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in assembler language for a Cray CPU, 
It implements the checksum calculation 
summing up to 512 bytes at a time with a basic 
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was 


This example omits many details having to 


for clarity. 


Register Al holds the address of a 512-byte block of memory to 
checksum. First two copies of the data are loaded into two vector 


registers. 


per clock cycle. 


One is vector-shifted right 32 bits, 
vector-ANDed with a 32 bit mask. 
together. Since all these operations chain, 
Then it collapses the result vector in a loop that 


while the other is 
Then the two vectors are added 
it produces one result 


adds each element to a scalar register. Finally, the end-around 
carry is performed and the result is folded to 16-bits. 


EBM 
AO 
VL 
S1 
A2 
vi 
V2 
v3 
V1 
A2 
S1 
S4 
A4 
CK$LOOP S2 
A2 
AO 
S1 
JAN 
S2 
S1 
S1 
S2 
S1 
S1 
S1 
CMR 
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A1 

64 
<32 

32 
,A0,1 
S1&V1 
V1>A2 
V2+V3 
63 

0 

<16 
16 
v1,A2 
A2-1 
A2 
$1+S2 
CKSLOOP 
S1&S4 
S1>A4 
S$1+S2 
S1&S4 
S1>A4 
$1+S2 
#S1 


At this point, 


& Partridge 


use full vectors 
form 32-bit mask from the right. 


load packet into V1 
Form right-hand 32-bits in V2. 
Form left-hand 32-bits in V3. 


Add 


the two together. 


Prepare to collapse into a scalar. 


Form 16-bit mask from the right. 


Form 
Form 


Form 
Form 


right-hand 16-bits in S2 
left-hand 16-bits in S1 


right-hand 16-bits in S2 
left-hand 16-bits in S1 


Take one’s complement 


S1 contains the checksum. 
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4.4 


The following example, 
the data 4 bytes at a time. 
to add the last fullword when the length is not a multiple of 4, 
to reverse the bytes when necessary. 


IBM 370 


RCARRY. 
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For clarity, 


in assembler language for an IBM 370 CPU, 
we have omitted the logic 
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sums 


and 


The result is left in register 


This code has been timed on an IBM 3090 CPU at 27 usec/KB when 


summing all one bits. 
trouble is taken to word-align the addends 
at both the beginning and the end, 


to compensate for starting on an odd byte). 


* 
* 
* 
x 
* 
* 


CHECKSUM 


LOOP 


Braden, 


This time is reduced to 24.3 usec/KB if the 
(requiring special cases 
and byte-swapping when necessary 


Registers RADDR and RCOUNT contain the address and length of 


(RCARRY, 
(RCOUNT, 


SR 
SR 
LA 


RSUM) 
RMOD) 


RSUM, RSUM 
RCARRY, RCARRY 
RONE, 1 


RCOUNT, 6 
RCOUNT, RONE 
RMOD, 26 
RADDR, R3 
RADDIR, =F (64) 
RMOD, 1 


RMOD, DOPEVEC9 (RMOD) 


LOOP (RMOD) 


Inner loop: 


BCT 


Add Carries into sum, 


ALR 
BC 


Borman, 


RSUM, 0 (, RADDR) 
12,*+6 

RCARRY, RONE 
RSUM, 4 (, RADDR) 
12,*+6 

RCARRY, RONE 


the block to be checksummed. 


must be an even/odd register pair. 
must be an even/odd register pair. 


Clear working registers. 


Set up constant 1. 


Count/64 to RCOUNT. 
+1 = 


# times in loop. 


Size of partial chunk to RMOD. 

Adjust addr to compensate for 
jumping into the loop. 

(RMOD/4)*2 is halfword index. 


Use magic dope-vector for offset, 


and jump into the loop... 


Add Logical fullword 
Branch if no carry 
Add 1 end-around 

Add Logical fullword 

Branch if no carry 

Add 1 end-around 


14 more replications 


RADDR, =F’ 64’ 
RCOUNT, LOOP 


RCARRY, RSUM 
12) *+6 


& Partridge 


Increment address ptr 
Branch on Count 


and fold to 16 bits 


Add SUM and CARRY words 
and take care of carry 
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RCARRY, RONE 

RCARRY, 16 Fold 32-bit sum into 
RSUM, 16 16-bits 

RCARRY, RSUM 

RCARRY, =X' 0000FFFF” and take care of any 
DONE last carry 
RCARRY, =X’ OOOOFFFF’ 

RCARRY, =X' 0000FFFF” 1’s complement 


1988 
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Internet Experiment Note 45 5 June 1978 
TCP Checksum Function Design William W. Plummer 
Ly Introduction 

Checksums are included in packets in order that errors 


encountered during transmission may be detected. For Internet 
protocols such as TCP [1,9] this is especially important because 
packets may have to cross wireless networks such as the Packet 
Radio Network [2] and Atlantic Satellite Network [3] where 
packets may be corrupted. Internet protocols (e.g., those for 
real time speech transmission) can tolerate a certain level of 
transmission errors and forward error correction techniques or 
possibly no checksum at all might be better. The focus in this 
paper is on checksum functions for protocols such as TCP where 
the required reliable delivery is achieved by retransmission. 


Even if the checksum appears good on a message which has been 
received, the message may still contain an undetected error. The 
probability of this is bounded by 2**(-C) where C is the number 
of checksum bits. Errors can arise from hardware (and software) 
malfunctions as well as transmission errors. Hardware induced 
errors are usually manifested in certain well known ways and it 
is desirable to account for this in the design of the checksum 
function. Ideally no error of the "common hardware failure" type 
would go undetected. 


An example of a failure that the current checksum function 
handles successfully is picking up a bit in the network interface 
(or I/O buss, memory channel, etc.). This will always render the 
checksum bad. For an example of how the current function is 
inadequate, assume that a control signal stops functioning in the 
network interface and the interface stores zeros in place of the 
real data. These "all zero" messages appear to have valid 
checksums. Noise on the "There’s Your Bit" line of the ARPANET 
Interface [4] may go undetected because the extra bits input may 
cause the checksum to be perturbed (i.e., shifted) in the same 
way as the data was. 


Although messages containing undetected errors will occasionally 
be passed to higher levels of protocol, it is likely that they 


will not make sense at that level. In the case of TCP most such 
messages will be ignored, but some could cause a connection to be 
aborted. Garbled data could be viewed as a problem for a layer 


of protocol above TCP which itself may have a checksuming scheme. 


This paper is the first step in design of a new checksum function 


for TCP and some other Internet protocols. Several useful 
properties of the current function are identified. If possible 
- 1 — 
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TCP Checksum Function Design William W. Plummer 
these should be retained in any new function. A number of 


plausible checksum schemes are investigated. Of these only the 
"product code" seems to be simple enough for consideration. 


2. The Current TCP Checksum Function 


The current function is oriented towards sixteen-bit machines 
such as the PDP-11 but can be computed easily on other machines 


(e.g., PDP-10). A packet is thought of as a string of 16-bit 
bytes and the checksum function is the one’s complement sum (add 
with end-around carry) of those bytes. It is the one’s 
complement of this sum which is stored in the checksum field of 
the TCP header. Before computing the checksum value, the sender 
places a zero in the checksum field of the packet. If the 


checksum value computed by a receiver of the packet is zero, the 
packet is assumed to be valid. This is a consequence of the 
"negative" number in the checksum field exactly cancelling the 
contribution of the rest of the packet. 


Ignoring the difficulty of actually evaluating the checksum 
function for a given packet, the way of using the checksum 
described above is quite simple, but it assumes some properties 
of the checksum operator (one’s complement addition, "+" in what 


follows): 

(P1) + is commutative. Thus, the order in which 
the 16-bit bytes are "added" together is 
unimportant. 

(P2) + has at least one identity element (The 
current function has two: +0 and -0). This 
allows the sender to compute the checksum 
function by placing a zero in the packet checksum 
field before computing the value. 

(P3) + has an inverse. Thus, the receiver may 
evaluate the checksum function and expect a zero. 

(P4) + is associative, allowing the checksum field 


to be anywhere in the packet and the 16-bit bytes 
to be scanned sequentially. 


Mathematically, these properties of the binary operation "+" over 


the set of 16-bit numbers forms an Abelian group [5]. Of course, 

there are many Abelian groups but not all would be satisfactory 

for use as checksum operators. (Another operator readily 
-2- 
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available in the PDP-11 instruction set that has all of these 
properties is exclusive-OR, but XOR is unsatisfactory for other 
reasons.) 


Albeit imprecise, another property which must be preserved in any 
future checksum scheme is: 


(P5) + is fast to compute on a variety of machines 
with limited storage requirements. 


The current function is quite good in this respect. On the 
PDP-11 the inner loop looks like: 


LOOP: ADD (R1)+,RO ; Add the next 16-bit byte 
ADC RO ; Make carry be end-around 
SOB R2, LOOP ; Loop over entire packet. 


( 4 memory cycles per 16-bit byte ) 


On the PDP-10 properties P1-4 are exploited further and two 
16-bit bytes per loop are processed: 


LOOP: ILDB THIS, PTR ; Get 2 16-bit bytes 
ADD SUM, THIS ; Add into current sum 
JUMPGE SUM,CHKSU2 ; Jump if fewer than 8 carries 
LDB THIS, [POINT 20,SUM,19] ; Get left 16 and carries 
ANDI SUM,177777 ; Save just low 16 here 
ADD SUM, THIS ; Fold in carries 

CHKSU2: SOJG COUNT,LOOP ; Loop over entire packet 


( 3.1 memory cycles per 16-bit byte ) 


The "extra" instruction in the loops above are required to 
convert the two’s complement ADD instruction(s) into a one’s 
complement add by making the carries be end-around. One’s 
complement arithmetic is better than two’s complement because it 
is equally sensitive to errors in all bit positions. If two’s 
complement addition were used, an even number of 1’s could be 
dropped (or picked up) in the most significant bit channel 
without affecting the value of the checksum. It is just this 
property that makes some sort of addition preferable to a simple 
exclusive-OR which is frequently used but permits an even number 
of drops (pick ups) in any bit channel. RIM10B paper tape format 
used on PDP-10s [10] uses two’s complement add because space for 
the loader program is extremely limited. 


-3- 
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Another property of the current checksum scheme is: 


(P6) Adding the checksum to a packet does not change 
the information bytes. Peterson [6] calls this a 
"systematic" code. 


This property allows intermediate computers such as gateway 
machines to act on fields (i.e., the Internet Destination 
Address) without having to first decode the packet. Cyclical 
Redundancy Checks used for error correction are not systematic 
either. However, most applications of CRCS tend to emphasize 
error detection rather than correction and consequently can send 
the message unchanged, with the CRC check bits being appended to 


the end. The 24-bit CRC used by ARPANET IMPs and Very Distant 
Host Interfaces [4] and the ANSI standards for 800 and 6250 bits 
per inch magnetic tapes (described in [11]) use this mode. 


Note that the operation of higher level protocols are not (by 
design) affected by anything that may be done by a gateway acting 
on possibly invalid packets. It is permissible for gateways to 
validate the checksum on incoming packets, but in general 
gateways will not know how to do this if the checksum is a 
protocol-specific feature. 


A final property of the current checksum scheme which is actually 
a consequence of Pl and P4 is: 


(P7) The checksum may be incrementally modified. 


This property permits an intermediate gateway to add information 
to a packet, for instance a timestamp, and "add" an appropriate 
change to the checksum field of the packet. Note that the 
checksum will still be end-to-end since it was not fully 


recomputed. 

3% Product Codes 

Certain "product codes" are potentially useful for checksuming 
purposes. The following is a brief description of product codes 


in the context of TCP. More general treatment can be found in 
Avizienis [7] and probably other more recent works. 


The basic concept of this coding is that the message (packet) to 
be sent is formed by transforming the original source message and 
adding some "check" bits. By reading this and applying a 
(possibly different) transformation, a receiver can reconstruct 


- 4 - 
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the original message and determine if it has been corrupted 
during transmission. 


check plus "valid" flag 


Original Sent Reconstructed 


With product codes the transformation is Ms = K * Mo . That is, 
the message sent is simply the product of the original message 
Mo and some well known constant K . To decode, the received 
Ms is divided by K which will yield Mr as the quotient and 
0 as the remainder if Mr is to be considered the same as Mo 


The first problem is selecting a "good" value for K, the "check 
factor". K must be relatively prime to the base chosen to 
express the message. (Example: Binary messages with K 
incorrectly chosen to be 8. This means that Ms looks exactly 
like Mo except that three zeros have been appended. The only 
way the message could look bad to a receiver dividing by 8 is if 
the error occurred in one of those three bits.) 


For TCP the base R will be chosen to be 2**16. That is, every 
16-bit byte (word on the PDP-11) will be considered as a digit of 


a big number and that number is the message. Thus, 
Mo = SIGMA [ Bi * (R**i)] ' Bi is i-th byte 
i=0 to N 


Ms = K * Mo 


Corrupting a single digit of Ms will yield Ms’ = Ms +or- 
C* (R**j) for some radix position j . The receiver will compute 
Ms’ /K = Mo +or- C(R**j)/K. Since R and K are relatively prime, 
C* (R**j) cannot be any exact multiple of K. Therefore, the 
division will result in a non-zero remainder which indicates that 
Ms’ is a corrupted version of Ms. As will be seen, a good 
choice for K is (R**b - 1), for some b which is the "check 
length" which controls the degree of detection to be had for 
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burst errors which affect a string of digits (i.e., 16-bit bytes) 
in the message. In fact b will be chosen to be 1, so K will 
be 2**16 - 1 so that arithmetic operations will be simple. This 
means that all bursts of 15 or fewer bits will be detected. 
According to [7] this choice for b results in the following 
expression for the fraction of undetected weight 2 errors: 


f = 16(k-1)/[32(16k-3) + (6/k)] where k is the message length. 


For large messages f approaches 3.125 per cent as k goes to 
infinity. 


Multiple precision multiplication and division are normally quite 
complex operations, especially on small machines which typically 


lack even single precision multiply and divide operations. The 
exception to this is exactly the case being dealt with here -- 
the factor is 2**16 - 1 on machines with a word length of 16 
bits. The reason for this is due to the following identity: 

Q* (R** j) = Q, mod (R-1) 0 <=Q<R 
That is, any digit Q in the selected radix (0, 1, ee.  ‘R=L) 


multiplied by any power of the radix will have a remainder of Q 
when divided by the radix minus 1. 


Example: In decimal R = 10. Pick Q= 6. 


6 = 0o* 99 + 6 = 6, mod 9 
60 = 6* 9 + 6 = 6, mod 9 
600 = 66 * 9 + 6 = 6, mod 9 etc. 
More to the point, rem(31415/9) = rem((30000+1000+400+10+5) /9) 


= (3 mod 9) + (1 mod 9) + (4 mod 9) + (1 mod 9) + (5 mod 9) 
(3+1+4+1+5) mod 9 

= 14 mod 9 

= 5 


So, the remainder of a number divided by the radix minus one can 
be found by simply summing the digits of the number. Since the 
radix in the TCP case has been chosen to be 2**16 and the check 


factor is 2**16 - 1, a message can quickly be checked by summing 
all of the 16-bit words (on a PDP-11), with carries being 
end-around. If zero is the result, the message can be considered 


valid. Thus, checking a product coded message is exactly the 
same complexity as with the current TCP checksum! 
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In order to form Ms, the sender must multiply the multiple 


precision "number" Mo by 2**16 - 1. Or, Ms = (2**16)Mo - Mo. 
This is performed by shifting Mo one whole word’s worth of 
precision and subtracting Mo. Since carries must propagate 


between digits, but it is only the current digit which is of 
interest, one’s complement arithmetic is used. 


(2**16)Mo = MoO + Mol + Mo2 + ... + MoxX + 0 
- Mo = = (MOO. A -Mol -E disco eek es + MoxX) 
Ms = MsO + Msl +... — Mox 


A loop which implements this function on a PDP-11 might look 


like: 
LOOP: MOV -2(R2),RO ; Next byte of (2**16)Mo 
SBC RO ; Propagate carries from last SUB 
SUB (R2)+,RO0 ; Subtract byte of Mo 
MOV RO, (R3)+ ; Store in Ms 
SOB R1, LOOP ; Loop over entire message 


; 8 memory cycles per 16-bit byte 


Note that the coding procedure is not done in-place since it is 


not systematic. In general the original copy, Mo, will have to 
be retained by the sender for retransmission purposes and 
therefore must remain readable. Thus the MOV RO, (R3)+ is 


required which accounts for 2 of the 8 memory cycles per loop. 


The coding procedure will add exactly one 16-bit word to the 
message since Ms < (2**16)Mo . This additional 16 bits will be 
at the tail of the message, but may be moved into the defined 
location in the TCP header immediately before transmission. The 
receiver will have to undo this to put Ms back into standard 
format before decoding the message. 


The code in the receiver for fully decoding the message may be 
inferred by observing that any word in Ms contains the 
difference between two successive words of Mo minus the carries 
from the previous word, and the low order word contains minus the 
low word of Mo. So the low order (i.e., rightmost) word of Mr is 
just the negative of the low order byte of Ms. The next word of 
Mr is the next word of Ms plus the just computed word of Mr 
plus the carry from that previous computation. 


A slight refinement of the procedure is required in order to 
protect against an all-zero message passing to the destination. 


This will appear to have a valid checksum because Ms’/K = 0/K 
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= 0 with 0 remainder. The refinement is to make the coding be 
Ms = K*Mo + C where C is some arbitrary, well-known constant. 
Adding this constant requires a second pass over the message, but 
this will typically be very short since it can stop as soon as 
carries stop propagating. Chosing C = 1 is sufficient in most 
cases. 


The product code checksum must be evaluated in terms of the 
desired properties P1 - P7. It has been shown that a factor of 
two more machine cycles are consumed in computing or verifying a 
product code checksum (P5 satisfied?). 


Although the code is not systematic, the checksum can be verified 
quickly without decoding the message. If the Internet 
Destination Address is located at the least significant end of 
the packet (where the product code computation begins) then it is 
possible for a gateway to decode only enough of the message to 
see this field without having to decode the entire message. 


Thus, P6 is at least partially satisfied. The algebraic 
properties P1 through P4 are not satisfied, but only a small 
amount of computation is needed to account for this -- the 


message needs to be reformatted as previously mentioned. 


P7 is satisfied since the product code checksum can be 
incrementally updated to account for an added word, although the 
procedure is somewhat involved. Imagine that the original 
message has two halves, H1 and H2. Thus, Mo = H1*(R**j) + H2. 
The timestamp word is to be inserted between these halves to form 
a modified Mo’ = H1*(R**(j+1)) + T*(R**3) + H2. Since K has 


been chosen to be R-1, the transmitted message Ms’ = Mo’ (R-1). 
Then, 
Ms’ = Ms*R + T(R-1) (R**j) + P2((R-1) **2) 


Ms*R + T*(R**(j+1)) + T*(R**j) + P2*(R**2) - 2*P2*R - P2 


Recalling that R is 2**16, the word size on the PDP-11, 
multiplying by R means copying down one word in memory. So, 
the first term of Ms’ is simply the unmodified message copied 
down one word. The next term is the new data T added into the 
Ms’ being formed beginning at the (j+1)th word. The addition is 
fairly easy here since after adding in T all that is left is 
propagating the carry, and that can stop as soon as no carry is 
produced. The other terms can be handle similarly. 
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4. More Complicated Codes 


There exists a wealth of theory on error detecting and correcting 
codes. Peterson [6] is an excellent reference. Most of these 
"CRC" schemes are designed to be implemented using a shift 
register with a feedback network composed of exclusive-ORs. 
Simulating such a logic circuit with a program would be too slow 
to be useful unless some programming trick is discovered. 


One such trick has been proposed by Kirstein [8]. Basically, a 
few bits (four or eight) of the current shift register state are 
combined with bits from the input stream (from Mo) and the result 
is used as an index to a table which yields the new shift 
register state and, if the code is not systematic, bits for the 
output stream (Ms). A trial coding of an especially "good" CRC 
function using four-bit bytes showed showed this technique to be 
about four times as slow as the current checksum function. This 


was true for both the PDP-10 and PDP-11 machines. Of the 
desirable properties listed above, CRC schemes satisfy only P3 
(It has an inverse.), and P6 (It is systematic.). Placement of 


the checksum field in the packet is critical and the CRC cannot 
be incrementally modified. 


Although the bulk of coding theory deals with binary codes, most 
of the theory works if the alphabet contains q symbols, where 
q is a power of a prime number. For instance q taken as 2**16 
should make a great deal of the theory useful on a word-by-word 
basis. 


5. Outboard Processing 


When a function such as computing an involved checksum requires 
extensive processing, one solution is to put that processing into 
an outboard processor. In this way "encode message" and "decode 
message" become single instructions which do not tax the main 
host processor. The Digital Equipment Corporation VAX/780 
computer is equipped with special hardware for generating and 
checking CRCs [13]. In general this is not a very good solution 
since such a processor must be constructed for every different 
host machine which uses TCP messages. 


It is conceivable that the gateway functions for a large host may 
be performed entirely in an "Internet Frontend Machine". This 
machine would be responsible for forwarding packets received 


-9- 


Braden, Borman, & Partridge [Page 21] 


RFC 1071 Computing the Internet Checksum September 1988 


Internet Experiment Note 45 5 June 1978 
TCP Checksum Function Design William W. Plummer 


either from the network(s) or from the Internet protocol modules 
in the connected host, and for reassembling Internet fragments 
into segments and passing these to the host. Another capability 
of this machine would be to check the checksum so that the 
segments given to the host are known to be valid at the time they 
leave the frontend. Since computer cycles are assumed to be both 
inexpensive and available in the frontend, this seems reasonable. 


The problem with attempting to validate checksums in the frontend 
is that it destroys the end-to-end character of the checksum. If 
anything, this is the most powerful feature of the TCP checksum! 
There is a way to make the host-to-frontend link be covered by 
the end-to-end checksum. A separate, small protocol must be 
developed to cover this link. After having validated an incoming 
packet from the network, the frontend would pass it to the host 


saying "here is an Internet segment for you. Call it #123". The 
host would save this segment, and send a copy back to the 
frontend saying, "Here is what you gave me as #123. Is it OK?". 


The frontend would then do a word-by-word comparison with the 
first transmission, and tell the host either "Here is #123 
again", or "You did indeed receive #123 properly. Release it to 
the appropriate module for further processing." 


The headers on the messages crossing the host-frontend link would 
most likely be covered by a fairly strong checksum so that 
information like which function is being performed and the 
message reference numbers are reliable. These headers would be 
quite short, maybe only sixteen bits, so the checksum could be 
quite strong. The bulk of the message would not be checksumed of 
course. 

The reason this scheme reduces the computing burden on the host 
is that all that is required in order to validate the message 
using the end-to-end checksum is to send it back to the frontend 
machine. In the case of the PDP-10, this requires only 0.5 
memory cycles per 16-bit byte of Internet message, and only a few 
processor cycles to setup the required transfers. 


6. Conclusions 


There is an ordering of checksum functions: first and simplest is 
none at all which provides no error detection or correction. 
Second, is sending a constant which is checked by the receiver. 
This also is extremely weak. Third, the exclusive-OR of the data 
may be sent. XOR takes the minimal amount of computer time to 
generate and check, but is not a good checksum. A two’s 
complement sum of the data is somewhat better and takes no more 
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computer time to compute. Fifth, is the one’s complement sum 
which is what is currently used by TCP. It is slightly more 
expensive in terms of computer time. The next step is a product 
code. The product code is strongly related to one’s complement 
sum, takes still more computer time to use, provides a bit more 
protection against common hardware failures, but has some 
objectionable properties. Next is a genuine CRC polynomial code, 
used for checking purposes only. This is very expensive for a 
program to implement. Finally, a full CRC error correcting and 
detecting scheme may be used. 


For TCP and Internet applications the product code scheme is 
viable. It suffers mainly in that messages must be (at least 
partially) decoded by intermediate gateways in order that they 
can be forwarded. Should product codes not be chosen as an 
improved checksum, some slight modification to the existing 
scheme might be possible. For instance the "add and rotate" 
function used for paper tape by the PDP-6/10 group at the 
Artificial Intelligence Laboratory at M.I.T. Project MAC [12] 
could be useful if it can be proved that it is better than the 
current scheme and that it can be computed efficiently on a 
variety of machines. 
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